A Novel Metadata Based Multi-Label Document Classification Technique
Abstract
From the beginning, the process of research and its publication has been an ever-growing phenomenon, and with the emergence of web technologies its growth rate is overwhelming. On a rough estimate, more than thirty thousand journals have been issuing around four million papers annually on average. Search engines, indexing services, and digital libraries have been searching for such publications over the web. Nevertheless, getting the most relevant articles against user requests is yet a fantasy. This is mainly because articles are not appropriately indexed based on the hierarchies of granular subject classification. To overcome this issue, researchers are striving to investigate new techniques for classification, especially when the complete article text is not available (the case of non-open access articles). The proposed study aims to investigate multilabel classification over the available metadata in the best possible way and to assess "to what extent metadata-based features can perform in contrast to content-based approaches." In this regard, novel techniques for investigating multilabel classification have been proposed, developed, and evaluated on metadata such as the Title and Keywords of the articles. The proposed technique has been assessed on two diverse datasets, namely, one from the Journal of Universal Computer Science (J.UCS) and a benchmark dataset that comprises articles published by the Association for Computing Machinery (ACM). The proposed technique yields encouraging results in contrast to the state-of-the-art techniques in the literature.
Similar resources
Novel Unsupervised Features for Czech Multi-label Document Classification
This paper deals with automatic multi-label document classification in the context of a real application for the Czech News Agency. The main goal of this work consists in proposing novel fully unsupervised features based on an unsupervised stemmer, Latent Dirichlet Allocation and semantic spaces (HAL and COALS). The proposed features are integrated into the document classification task. Another...
Word Embeddings for Multi-label Document Classification
In this paper, we analyze and evaluate word embeddings for representation of longer texts in the multi-label document classification scenario. The embeddings are used in three convolutional neural network topologies. The experiments are realized on the Czech ČTK and English Reuters-21578 standard corpora. We compare the results of word2vec static and trainable embeddings with randomly initializ...
Multi-label Document Classification in Czech
This paper deals with multi-label automatic document classification in the context of a real application for the Czech news agency. The main goal of this work is to compare and evaluate the three most promising multi-label document classification approaches on the Czech language. We show that the simple method based on a meta-classifier proposed by Zhu et al. outperforms significantly the other appro...
Boosting-based Multi-label Classification
Multi-label classification is a machine learning task that assumes that a data instance may be assigned multiple class labels at the same time. Modelling of this problem has become an important research topic recently. This paper revokes AdaBoostSeq multi-label classification algorithm and examines it in order to check its robustness properties. It can be stated that AdaBoostSeq ...
A Multilingual Polarity Classification Method using Multi-label Classification Technique Based on Corpus Analysis
In NTCIR-7 MOAT, we participated in four sub-tasks (opinion & holder detection, relevance judgment, and polarity classification) on two language sides: Japanese and English. In this paper, we focused on the feature selection and polarity classification methodology in both languages. To detect opinion and classify the polarity, the features were selected based on a st...
Journal
Journal title: Computer systems science and engineering
Year: 2023
ISSN: 0267-6192
DOI: https://doi.org/10.32604/csse.2023.033844